Reinforcement Learning: Model-based
Abstract
Reinforcement learning (RL) refers to a wide range of different learning algorithms for improving a behavioral policy on the basis of numerical reward signals that serve as feedback. In its basic form, reinforcement learning bears a striking resemblance to ‘operant conditioning’ in psychology and animal learning: actions that are rewarded tend to occur more frequently; actions that are punished are less likely to be repeated. Because of its simplicity, reinforcement learning is applicable to a very wide range of different problems. Unfortunately, also because of its simplicity, RL often learns extremely slowly in complex problems. An ongoing challenge in the field of machine learning is developing learning algorithms that share the generality and elegance of RL but learn in a more ‘intelligent’ and sophisticated manner. This note considers one particular approach to extending the capabilities of RL algorithms, known as model-based reinforcement learning. Model-based RL is also of significant interest to researchers in cognitive science, as it is believed to correspond more closely to the type of learning that humans routinely demonstrate. In model-based RL, as in the basic model-free approach, the primary goal of learning is to improve a behavioral policy so as to maximize a numerical reward signal. However, the approach differs from model-free RL in that the agent simultaneously attempts to learn a model of its environment. Such a model is useful: it allows the agent to predict the consequences of actions before they are taken, to generate virtual experience, and to perform mental search through a problem space to locate an efficient solution. Thus, model-based RL integrates learning on the basis of past experience with planning of future actions. Further, a model of the environment is not directly tied to the task one is currently performing.
For example, consider a mouse learning to navigate a maze to find a food reward. While performing this task, the mouse could acquire a mental representation of the structure of the maze. If the mouse’s goals change, for example, from finding food to finding a safe place to hide, the previously learned policy (the sequence of turns to locate the food reward) is no help, but the learned model of the maze structure could still aid the mouse in achieving its new goal. The downside of model-based RL is that acquiring a model in the first place adds to the overall...
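The maze example can be made concrete with a minimal sketch. Everything named here is an assumption for illustration and does not come from the source: the 4x4 maze layout, the functions `step`, `learn_model`, `plan`, and `rollout`, the reward of 1 for entering the goal, and the choice of value iteration as the planner. The agent wanders the maze once to learn a transition model, then plans for two different goals from that same model without gathering new experience.

```python
import random

# Hypothetical 4x4 maze ('#' = wall); the layout is an assumption for illustration.
MAZE = ["....",
        ".##.",
        "....",
        ".#.."]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """True environment: move one cell unless a wall or the border blocks it."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    n_row, n_col = row + d_row, col + d_col
    inside = 0 <= n_row < len(MAZE) and 0 <= n_col < len(MAZE[0])
    if inside and MAZE[n_row][n_col] != "#":
        return (n_row, n_col)
    return state


def learn_model(start, n_steps=5000, seed=0):
    """Wander randomly and record each observed (state, action) -> next state."""
    rng = random.Random(seed)
    model, state = {}, start
    for _ in range(n_steps):
        action = rng.choice(list(ACTIONS))
        next_state = step(state, action)
        model[(state, action)] = next_state
        state = next_state
    return model


def backup(model, values, state, action, goal, gamma):
    """One-step lookahead in the learned model: reward 1 for entering the goal."""
    next_state = model[(state, action)]
    reward = 1.0 if next_state == goal else 0.0
    return reward + gamma * values[next_state]


def plan(model, goal, gamma=0.9, sweeps=50):
    """Value iteration on the learned model only; returns a greedy policy."""
    states = {s for (s, _) in model} | set(model.values())
    values = {s: 0.0 for s in states}
    known = {s: [a for a in ACTIONS if (s, a) in model] for s in states}
    for _ in range(sweeps):
        for s in states:
            if s != goal and known[s]:
                values[s] = max(backup(model, values, s, a, goal, gamma)
                                for a in known[s])
    return {s: max(known[s], key=lambda a: backup(model, values, s, a, goal, gamma))
            for s in states if s != goal and known[s]}


def rollout(policy, start, goal, max_steps=100):
    """Follow the planned policy in the real maze and count the steps taken."""
    state, steps = start, 0
    while state != goal and steps < max_steps:
        state = step(state, policy[state])
        steps += 1
    return steps


start, food, shelter = (0, 0), (0, 3), (3, 3)
model = learn_model(start)  # learned once, from real experience
food_steps = rollout(plan(model, food), start, food)
shelter_steps = rollout(plan(model, shelter), start, shelter)  # same model, new goal
print(food_steps, shelter_steps)
```

The policy planned for the food location is useless for reaching the shelter, but the second call to `plan` reuses the learned transitions unchanged. The cost of this flexibility is the exploration budget spent in `learn_model` before any planning can happen.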
Related resources
Reinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic
In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...
Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm
In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...
Using BELBIC based optimal controller for omni-directional threewheel robots model identified by LOLIMOT
In this paper, an intelligent controller is applied to control the motion of omni-directional robots. First, the dynamics of the three-wheel robots, as a nonlinear plant with considerable uncertainties, are identified using an efficient training algorithm named LoLiMoT. Then, an intelligent controller based on a brain emotional learning algorithm is applied to the identified model. This emotional l...
Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)
In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide a mobile robot through the dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...
Reinforcement Learning Based PID Control of Wind Energy Conversion Systems
In this paper an adaptive PID controller for Wind Energy Conversion Systems (WECS) has been developed. The adaptation technique applied to this controller is based on Reinforcement Learning (RL) theory. Nonlinear characteristics of wind variations as plant input, wind turbine structure and generator operational behavior demand a high-quality adaptive controller to ensure both robust stability an...
Outsourcing or Insourcing of Transportation System Evaluation Using Intelligent Agents Approach
Nowadays, outsourcing is viewed as a trade strategy, and organizations tend to adopt new strategies to achieve competitive advantages in the current world of business. Focusing on core competencies and transferring most activities to resources outside the organization (outsourcing) is one such strategy. In this paper, we aim to decide on decision maker agent of transportation system, by a...
Publication date: 2012